ABSTRACT



The authors discuss designing a test using a recently developed
approach in item response theory. A brief review of terms and
formulae is followed by two methods of test design. The first method
selects items using confidence envelopes. The second method uses
item characteristic curves and their confidence intervals. Using
test reliability as the criterion, the second method is preferred
for test design in item response theory.
Item Response Theory

The three-parameter IRT model is used to estimate P(θ), the
probability of a correct response to an item, as follows:

$$P(\theta) = c + \frac{1 - c}{1 + e^{-1.7a(\theta - b)}}$$

The parameter c represents the probability that an examinee
completely lacking in ability will answer the item correctly. It is
a guessing parameter, or pseudo-chance score level. If an item
cannot be answered correctly by guessing, then c = 0. The
parameter b is a location parameter: the position of the curve
along the ability scale, or item difficulty. The more difficult the
item, the further the curve lies to the right. The logistic curve
has its inflection point at θ = b. When there is no guessing, b is
the ability level at which the probability of a correct answer is
.5. When guessing occurs, b is the ability level at which the
probability of a correct answer is halfway between c and 1.00.
The parameter a is proportional to the slope of the curve at the
inflection point, which equals 0.425a(1 - c). It represents the
discriminating power of the item, or the degree to which the
examinee's response varies with ability level (Lord, 1980, pp. 12-13).
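The properties just listed (inflection at θ = b, P(b) halfway between c and 1, slope 0.425a(1 - c) at b) can be checked numerically. The sketch below is illustrative only: it assumes the scaling constant D = 1.7 (consistent with 0.425 = 1.7 / 4) and uses the Table 2 estimates for item 4; the helper names `icc` and `slope_at` are hypothetical.

```python
import math

# Assumption: scaling constant D = 1.7, so that D / 4 = 0.425 (the factor in the slope formula).
D = 1.7


def icc(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic item characteristic curve P(theta)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def slope_at(theta: float, a: float, b: float, c: float, h: float = 1e-6) -> float:
    """Central-difference approximation of dP/dtheta at theta."""
    return (icc(theta + h, a, b, c) - icc(theta - h, a, b, c)) / (2.0 * h)


# Item 4 from Table 2
a, b, c = 1.547, -1.573, 0.240
print(icc(b, a, b, c), (1.0 + c) / 2.0)             # P(b) is halfway between c and 1
print(slope_at(b, a, b, c), 0.425 * a * (1.0 - c))  # numerical vs. closed-form slope at b
```

The numerical derivative is used only as an independent check on the closed-form slope; any small step size gives the same agreement here.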



Confidence Envelope 2 

The assumptions of item response theory are also important when
estimating reliability. A reasonable assumption is that P(θ)
increases as θ increases. Another is that an examinee's ability (θ)
is all we need in order to determine the probability of success on
a specific item. The assumption of local independence requires that
any two items be uncorrelated when ability (θ) is fixed; it follows
directly from the assumption that the test is unidimensional.
Ability (θ) is probably not normally distributed for most groups of
examinees; unidimensionality, however, is a property of the items
and does not cease to exist because the distribution of ability in
a group changes (Lord, 1980, pp. 19-20). Our concern is that the
values of a, b, and c lie within a 95% confidence interval.
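Local independence means that, at a fixed θ, the probability of a whole response pattern is the product of the item probabilities: P_i(θ) for each correct answer and Q_i(θ) = 1 - P_i(θ) for each incorrect one. A minimal sketch, assuming the 3PL model with D = 1.7 and using the Table 2 estimates for items 4 to 6; the response pattern and the name `pattern_probability` are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant (1.7 / 4 = 0.425)


def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def pattern_probability(theta, responses, items):
    """P(response pattern | theta) under local independence: the product over
    items of P_i(theta) for a 1 (correct) and Q_i(theta) for a 0 (incorrect)."""
    prob = 1.0
    for u, (a, b, c) in zip(responses, items):
        p = icc(theta, a, b, c)
        prob *= p if u == 1 else 1.0 - p
    return prob


# Items 4, 5 and 6 from Table 2; the response pattern is illustrative only
items = [(1.547, -1.573, 0.240), (1.475, -1.298, 0.250), (1.740, -1.074, 0.250)]
print(pattern_probability(0.0, [1, 1, 0], items))
```

Because the item responses are conditionally independent, the probabilities of all 2^n patterns at a given θ sum to 1.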

A data set (Table 1) is used to illustrate our point (Wright
and Stone, 1979, p. 31).



Insert Table 1 Here 



An item analysis (Table 2) was conducted to estimate the a, b, and
c parameters (MicroCAT, 1986).



Insert Table 2 Here 
Confidence Envelopes vs. Test Envelopes

A recently developed approach to test design in item response
theory proposes the use of confidence envelopes (Thissen and Wainer,
1990). As those authors state:

"Confidence envelopes provide a description of the sampling 
variation of item response curves in the space of the fitted 
functions. They can be used to give the data analyst a clear 
idea of the class of item response curves that are compatible 
with the data. M-line plots may be used to show the width of
the envelope, as well as the shapes and relative posterior
density of the included curves." (p. 126)

Each item characteristic curve is visually examined to see if it 
fits into the confidence envelope. 

For example, given a = 45 degrees, b = 0, and c = .25, with
difficulty ranging from +1.9 to -1.8, the plot of the item
characteristic curve can be examined to see whether it lies within
the upper and lower boundaries of the specific confidence envelope.
The upper and lower boundaries can be computed using a
three-parameter IRT model (Lord and Pashley, 1988). Items for a
completed test might appear as in Figure 1 and would include only
those items that fall within the upper and lower boundaries of the
confidence envelope.
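The visual check described above can be mimicked numerically by evaluating a candidate item characteristic curve on a grid of θ values and testing whether it stays between the envelope's lower and upper boundaries. This is a sketch under loud assumptions: the boundary curves below are hypothetical stand-ins (3PL curves with b shifted by ±0.3), not the Lord and Pashley confidence band, and `fits_envelope` is a hypothetical helper.

```python
import math
from functools import partial

D = 1.7  # assumed scaling constant (1.7 / 4 = 0.425)


def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fits_envelope(curve, lower, upper, grid):
    """True if lower(t) <= curve(t) <= upper(t) at every ability value t in grid."""
    return all(lower(t) <= curve(t) <= upper(t) for t in grid)


grid = [-3.0 + 0.05 * k for k in range(121)]  # ability values from -3 to 3

# Hypothetical stand-in boundaries around a = 1, b = 0, c = .25 (NOT the
# Lord-Pashley band): shifting b left raises the curve, shifting b right lowers it.
upper = partial(icc, a=1.0, b=-0.3, c=0.25)
lower = partial(icc, a=1.0, b=0.3, c=0.25)

inside = partial(icc, a=1.0, b=0.1, c=0.25)   # hypothetical item that should fit
outside = partial(icc, a=1.0, b=0.8, c=0.25)  # hypothetical item lying below the band

print(fits_envelope(inside, lower, upper, grid))
print(fits_envelope(outside, lower, upper, grid))
```

A grid check only tests the sampled θ values; with smooth logistic curves and a fine grid this is usually adequate for screening items.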



Insert Figure 1 Here 




Our approach uses the number of items, or length of a test, L;
the width of a test (Max θ - Min θ), W; and the average ability
level of all examinees, or height of a test, H. This approach
follows known Rasch procedures (Wright and Stone, 1979).
For the data set provided, Max θ = 3.803 and Min θ = -2.995, giving
W = 6.798 and H = 0. The optimum length is not yet known.
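The width is simply the distance between the ability extremes. A small arithmetic check using the values reported for this data set (H is taken as given in the text rather than recomputed):

```python
# Test width and height in the sense used by Wright and Stone (1979),
# computed from the ability extremes reported for this data set.
max_theta = 3.803
min_theta = -2.995
W = max_theta - min_theta  # width of the test
H = 0.0                    # height (average ability level), as given in the text
print(f"W = {W:.3f}, H = {H}")  # W = 6.798, H = 0.0
```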

A test envelope is the area in a plot of ability (θ) versus P(θ)
bounded by the item characteristic curve at the lowest expected
ability and the item characteristic curve at the highest expected
ability (Figure 2).



Insert Figure 2 Here 



The goal in test design is to select item characteristic curves
between the minimum and maximum ability (θ) such that the items and
their confidence intervals cover the area without overlap (note that
each item has its own confidence interval). The item confidence
interval can easily be computed using logistic regression (Hauck,
1983).

The authors derive the width of the confidence interval for a
single item using a known three-parameter IRT procedure (Lord, 1980,
pp. 66-67). We are, however, only interested in the width of the
confidence interval (AB) at the point of inflection. For example,
consider an item with a confidence interval around it at the point
of inflection, b (Figure 3).
Insert Figure 3 Here



Derivation of AB 



Given

$$\tan\alpha = \frac{(1.96)(2)\,\sigma}{AB} = .425a(1 - c),$$

we can use a and c to determine the slope of the line at the point
of inflection, b (Lord, 1980), where σ is the standard error of P(θ)
at θ = b, so that (1.96)(2)σ is the vertical extent of the 95%
confidence interval at that point.

Then,

$$AB = \frac{3.92\,\sigma}{.425a(1 - c)}.$$



The width of the confidence interval (the distance AB) describes
the effectiveness of the test as a measure of ability (Lord, 1980,
p. 66). The AB distances for the 14 items in the data are given in
Table 3.



Insert Table 3 Here 



The maximum AB distance is .452. The optimum length of the test
is then W divided by the maximum AB (6.798 / .452 ≈ 15.04), rounded
up to 16 items. The next step is to compute the optimum b values for
a test with H = 0, W = 6.798, and L = 16. This can be accomplished
with the following formula (Wright and Stone, 1979, p. 140):

$$b_i = H + \frac{W}{2}\left[\frac{L - 2i + 1}{L}\right]$$

The optimal item difficulties are in Table 4. 



Insert Table 4 Here 
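The two steps above (test length from W and the maximum AB, then the optimum difficulties) are plain arithmetic and can be sketched as follows. Rounding the length up with `math.ceil` is an assumption based on 6.798 / .452 ≈ 15.04 yielding 16 items; the function names are hypothetical. The output reproduces the values in Table 4.

```python
import math


def optimum_length(W, max_ab):
    """Number of items whose intervals of width max_ab span the width W,
    rounded up to a whole item (assumed rounding rule)."""
    return math.ceil(W / max_ab)


def optimum_difficulties(H, W, L):
    """b_i = H + (W / 2) * (L - 2i + 1) / L for i = 1, ..., L (Wright and Stone, 1979)."""
    return [H + (W / 2) * (L - 2 * i + 1) / L for i in range(1, L + 1)]


L = optimum_length(6.798, 0.452)  # maximum AB from Table 3
b = optimum_difficulties(0.0, 6.798, L)
print(L)
print([round(x, 2) for x in b])
```

The difficulties are equally spaced W / L apart and symmetric about H, so each item covers one slice of the test width.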
Test construction requires an item pool; with these optimum
difficulties included in the item bank information, item selection
would be straightforward. Thus, a test envelope can be created from
item characteristic curve information and item confidence intervals
using item response theory. This approach should make it easier for
practitioners to apply item response theory in test construction.
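One simple way to carry out the item selection step is a greedy nearest-difficulty match: for each optimum b, take the unused bank item whose estimated b is closest. This is a sketch under stated assumptions: the item bank, its ids, and `select_items` are hypothetical, and a greedy match is not guaranteed to be the best overall assignment.

```python
def select_items(bank, targets):
    """Greedy selection: for each target difficulty, take the unused bank item
    whose b value is closest. `bank` maps a (hypothetical) item id to its b."""
    available = dict(bank)  # copy so the caller's bank is not modified
    chosen = []
    for target in targets:
        if not available:
            break  # item bank exhausted
        best = min(available, key=lambda item: abs(available[item] - target))
        chosen.append((target, best, available.pop(best)))
    return chosen


# Hypothetical item bank and two of the optimum difficulties from Table 4
bank = {"A": -2.9, "B": -1.1, "C": 0.0, "D": 0.3, "E": 1.0, "F": 2.8}
for target, item, b in select_items(bank, [-0.21, 0.21]):
    print(f"target b = {target:+.2f} -> item {item} (b = {b:+.2f})")
```

In practice the selected items would also be checked against their AB widths so that neighbouring confidence intervals do not overlap.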

Reliability of Methods 

The confidence envelope method and the test envelope method are 
evaluated based upon their reliability coefficients. Lord's 
equation for reliability is used (Lord, 1980, p. 52) : 



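$$r = \frac{\sigma_T^2}{\sum_{i=1}^{n} P_i(\theta)\,Q_i(\theta) + \sigma_T^2}$$

where Q_i(θ) = 1 - P_i(θ), n is the number of items, and σ_T² is
the true-score variance.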



Table 5 presents the confidence envelope method, in which all of the
items were selected with the same value of b, matched to the
person's ability. Table 6 presents the test envelope method, in
which the items were selected with differing values of b for one
person of ability θ = 0.
Insert Table 5 Here



Insert Table 6 Here



If all of the items were selected to fit into the same confidence
envelope, with a = 1 and b = 0 for all seven items (Table 5), then:

$$r = \frac{12.25 - 1.75}{1.75 + 12.25 - 1.75} = \frac{10.5}{12.25} \approx .857$$

If all the items were selected, with their confidence intervals, to
fit into a test envelope using optimum b values (Table 6), then:

$$r = \frac{12.25 - 1.75}{.58578 + 12.25 - 1.75} = \frac{10.5}{11.08578} \approx .947$$
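The entries of Tables 5 and 6 and the two reliability values can be regenerated from the logistic model. The tabled probabilities match a = 1, c = 0, θ = 0 and the scaling constant 1.7; that is an inference from the tabled values, not something the text states. The numerator 12.25 - 1.75 is used exactly as in the two worked examples above, and the function names are hypothetical.

```python
import math

D = 1.7  # scaling constant; with a = 1, c = 0, theta = 0 it reproduces Table 6


def p_correct(theta, a, b, c=0.0):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def sum_pq(theta, a, bs):
    """Sum over items of P_i(theta) * Q_i(theta)."""
    total = 0.0
    for b in bs:
        p = p_correct(theta, a, b)
        total += p * (1.0 - p)
    return total


def reliability(true_var, error_var):
    """r = true variance / (error variance + true variance), as in the worked examples."""
    return true_var / (error_var + true_var)


true_var = 12.25 - 1.75                   # numerator used in both worked examples
pq_conf = sum_pq(0.0, 1.0, [0] * 7)       # Table 5: seven items at b = 0
pq_test = sum_pq(0.0, 1.0, range(-3, 4))  # Table 6: b = -3, ..., 3
print(round(pq_conf, 5), round(reliability(true_var, pq_conf), 3))
print(round(pq_test, 5), round(reliability(true_var, pq_test), 3))
```

Computed without intermediate rounding, the Table 6 sum comes out near .58576 rather than .58578 (the tabled sum adds rounded terms); the reliability is .947 either way.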

Reliability clearly increased because different, non-overlapping
items were selected to cover the range of ability measured (the test
envelope). The test envelope method, in which items matching the
optimum ability levels are drawn from an item bank, should therefore
result in a more reliable test.

Table 1: 35 examinee responses to 18 items

                                  Items
Person   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
   1     1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0
   2     1  1  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0
   3     1  1  1  1  1  1  1  1  1  0  0  1  0  0  0  0  0  0
   4     1  1  1  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0
  ...
  33     1  1  1  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0
  34     1  1  1  1  1  1  1  1  1  1  0  1  0  1  0  0  0  0
  35     1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

(Wright and Stone, 1979, p. 31)
Table 2: IRT parameter estimates for data

Item     a        b        c
  1      item deleted
  2      item deleted
  3      item deleted
  4    1.547   -1.573   0.240
  5    1.475   -1.298   0.250
  6    1.740   -1.074   0.250
  7    1.578   -1.336   0.240
  8    1.938   -0.818   0.180
  9    1.787   -1.189   0.220
 10    1.801   -0.161   0.260
 11    1.943    0.757   0.190
 12    2.500    3.000   0.190
 13    2.500    3.000   0.200
 14    1.567    3.000   0.130
 15    1.622    2.289   0.030
 16    1.622    2.289   0.030
 17    1.622    2.289   0.030
 18      item deleted

(MicroCAT, 1988)
Table 3: Width (AB) of the confidence interval
at the point of inflection b

Item   Width (AB)
  4      .288
  5      .367
  6      .329
  7      .324
  8      .292
  9      .290
 10      .452
 11      .420
 12      .302
 13      .312
 14      .389
 15      .204
 16      .204
 17      .204
Table 4: Optimum b_i

Item    b_i
  1     3.19
  2     2.76
  3     2.34
  4     1.91
  5     1.49
  6     1.06
  7     0.64
  8     0.21
  9    -0.21
 10    -0.64
 11    -1.06
 12    -1.49
 13    -1.91
 14    -2.34
 15    -2.76
 16    -3.19

Note. H = 0, W = 6.798, L = 16.
Table 5: Confidence envelope method

 b     P(θ)    Q(θ)    P(θ)Q(θ)
 0     .50     .50       .25
 0     .50     .50       .25
 0     .50     .50       .25
 0     .50     .50       .25
 0     .50     .50       .25
 0     .50     .50       .25
 0     .50     .50       .25
Sum                     1.75

Note. Assumes equal item ability, discrimination, and difficulty.
Table 6: Test Envelope Method

  b      P(θ)      Q(θ)     P(θ)Q(θ)
 -3    .99394    .00606     .00602
 -2    .96770    .03230     .03126
 -1    .84553    .15447     .13061
  0    .50000    .50000     .25000
  1    .15447    .84553     .13061
  2    .03230    .96770     .03126
  3    .00606    .99394     .00602
Sum                         .58578

Note. Assumes equal item ability and discrimination.
Figure 1. Confidence Envelope (horizontal axis: Ability (θ))
Figure 3. Confidence interval width for a single item at point of
inflection b (item characteristic curve; vertical axis P(θ),
horizontal axis Ability (θ))